Sparse Dynamic Programming for Longest Common Subsequence from Fragments

Authors

  • Brenda S. Baker
  • Raffaele Giancarlo
Abstract

Sparse Dynamic Programming has emerged as an essential tool for the design of efficient algorithms for optimization problems coming from such diverse areas as computer science, computational biology, and speech recognition. We provide a new sparse dynamic programming technique that extends the Hunt–Szymanski paradigm for the computation of the longest common subsequence (LCS) and apply it to solve the LCS from Fragments problem: given a pair of strings X and Y (of length n and m, respectively) and a set M of matching substrings of X and Y, find the longest common subsequence based only on the symbol correspondences induced by the substrings. This problem arises in an application to analysis of software systems. Our algorithm solves the problem in $O(|M| \log |M|)$ time using balanced trees, or $O(|M| \log\log\min(|M|, nm/|M|))$ time using Johnson’s version of Flat Trees. These bounds apply for two cost measures. The algorithm can also be adapted to finding the usual LCS in $O((m+n) \log|\Sigma| + |M| \log |M|)$ time using balanced trees or $O((m+n) \log|\Sigma| + |M| \log\log\min(|M|, nm/|M|))$ time using Johnson’s version of Flat Trees, where M is the set of maximal matches between substrings of X and ...
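For orientation, the classic Hunt–Szymanski idea that the abstract says it extends can be sketched in a few lines. This is not the authors' fragment-based algorithm: it works on single-symbol matches only, and it uses a sorted Python list with binary search where the paper uses balanced trees or Johnson's structure. The function name `lcs_length_sparse` is an illustrative assumption.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_sparse(x: str, y: str) -> int:
    """Length of an LCS of x and y, visiting only matching position pairs."""
    # positions[c] lists, in increasing order, the indices j with y[j] == c.
    positions = defaultdict(list)
    for j, symbol in enumerate(y):
        positions[symbol].append(j)

    # thresholds[k] is the smallest index in y at which a common
    # subsequence of length k + 1 (using the prefix of x seen so far) can end.
    thresholds = []
    for symbol in x:
        # Decreasing j ensures one symbol of x extends a subsequence at most once.
        for j in reversed(positions.get(symbol, ())):
            k = bisect_left(thresholds, j)
            if k == len(thresholds):
                thresholds.append(j)
            else:
                thresholds[k] = j
    return len(thresholds)
```

The work is proportional to the number of matching pairs times a logarithmic search cost, plus the time to index `y`, which is why the approach pays off when matches are sparse.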


Similar resources

Longest Common Subsequence from Fragments via Sparse Dynamic

Sparse Dynamic Programming has emerged as an essential tool for the design of efficient algorithms for optimization problems coming from such diverse areas as Computer Science, Computational Biology and Speech Recognition [7, 11, 15]. We provide a new Sparse Dynamic Programming technique that extends the Hunt-Szymanski [2, 9, 8] paradigm for the computation of the Longest Common Subsequence (LCS) a...

Efficient algorithms for the longest common subsequence in $k$-length substrings

Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of...
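The problem statement above can be made concrete with a plain quadratic dynamic program. This is only a baseline sketch under the definition given in the abstract (count the maximum number of matched, non-overlapping, order-preserving substrings of length k); it is not one of the algorithms the paper proposes, and the name `lcsk_length` is an assumption.

```python
def lcsk_length(a: str, b: str, k: int) -> int:
    """Maximum number of non-overlapping length-k substring matches, in order."""
    if k <= 0:
        raise ValueError("k must be positive")
    n, m = len(a), len(b)
    # suffix[i][j]: length of the longest common suffix of a[:i] and b[:j].
    suffix = [[0] * (m + 1) for _ in range(n + 1)]
    # dp[i][j]: best number of matched k-length blocks within a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                suffix[i][j] = suffix[i - 1][j - 1] + 1
            best = max(dp[i - 1][j], dp[i][j - 1])
            # a[i-k:i] == b[j-k:j] exactly when the common suffix has length >= k.
            if suffix[i][j] >= k:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[n][m]
```

With `k = 1` the function returns the ordinary LCS length, which matches the abstract's remark that LCSk generalizes LCS.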

A Specialized Branching and Fathoming Technique for The Longest Common Subsequence Problem

Given a set S = {S1, ..., Sk} of finite strings, the k-longest common subsequence problem (k-LCSP) seeks a string L of maximum length such that L is a subsequence of each Si for i = 1, ..., k. This paper presents a technique, specialized branching, that solves k-LCSP. Specialized branching combines the benefits of both dynamic programming and branch and bound to reduce the search space. For la...

New Tabulation and Sparse Dynamic Programming Based Techniques for Sequence Similarity Problems

Calculating the length of a longest common subsequence (LCS) of two strings A and B of length n and m is a classic research topic, with many worst-case oriented results known. We present two algorithms for LCS length calculation with O(mn log log n / log n) and O(mn / log n + r) time complexity, respectively, the latter working for r = o(mn/(log n log log n)), where r is the number of matches in the ...

A simple algorithm for the constrained sequence problems

In this paper we address the constrained longest common subsequence problem. Given two sequences X, Y and a constrained sequence P, a sequence Z is a constrained longest common subsequence for X and Y with respect to P if Z is the longest subsequence of X and Y such that P is a subsequence of Z. Recently, Tsai [7] proposed an O(n · m · r) time algorithm to solve this problem using dynamic prog...
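The definition above can be turned into a direct three-dimensional dynamic program. The sketch below is a textbook-style formulation that assumes r denotes the length of P; it is not necessarily the recurrence used in the paper. The name `constrained_lcs_length` and the convention of returning `None` when no valid Z exists are assumptions made for illustration.

```python
def constrained_lcs_length(x: str, y: str, p: str):
    """Length of the longest common subsequence of x and y containing p.

    Returns None when no common subsequence contains p (assumed convention).
    """
    n, m, r = len(x), len(y), len(p)
    neg = float("-inf")  # marks states where the constraint cannot be met
    # dp[k][i][j]: best length over x[:i], y[:j] with p[:k] as a subsequence.
    dp = [[[0 if k == 0 else neg] * (m + 1) for _ in range(n + 1)]
          for k in range(r + 1)]
    for k in range(r + 1):
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                best = max(dp[k][i - 1][j], dp[k][i][j - 1])
                if x[i - 1] == y[j - 1]:
                    # Use the match without consuming a symbol of p.
                    best = max(best, dp[k][i - 1][j - 1] + 1)
                    if k > 0 and x[i - 1] == p[k - 1]:
                        # Use the match to cover p[k - 1] as well.
                        best = max(best, dp[k - 1][i - 1][j - 1] + 1)
                dp[k][i][j] = best
    result = dp[r][n][m]
    return None if result == neg else int(result)
```

With an empty P the function reduces to the ordinary LCS length; a constraint that appears in a different order in X and Y yields `None`.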

Journal title:
  • J. Algorithms

Volume 42  Issue

Pages  -

Publication date 2002